Fine keyword clustering using a thesaurus and example sentences for speech translation

Authors

  • Yumi Wakita
  • Kenji Matsui
  • Yoshinori Sagisaka
Abstract

For robust speech translation, we propose a new language translation method in which speech recognition results are mapped to example sentences using keywords. In this method, keyword clustering is used to cope with recognition errors and with the wide variety of words that do not appear in the training corpus. Initial classes, defined using only a thesaurus, are redefined using the dependencies between keywords in a limited number of example sentences. The effectiveness of our keyword clustering method is confirmed through example sentence search experiments. These experiments used keyword sets from (a) different sentences that include keywords not found in the example sentences and (b) recognition results for those sentences that contain recognition errors. Compared with a search method that uses keyword sets defined using only a thesaurus, the proposed method yielded improved search error rates.
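The abstract does not give the algorithm itself, so the following is only a toy sketch of one way thesaurus classes could be refined by keyword co-occurrence in example sentences. It is not the authors' method. The thesaurus entries, example sentences, and the function names `build_fine_classes` and `search` are all hypothetical, chosen for illustration.

```python
from collections import defaultdict

# Hypothetical toy thesaurus: keyword -> coarse semantic class.
THESAURUS = {
    "room": "facility", "bath": "facility",
    "ticket": "item", "map": "item",
    "reserve": "action", "book": "action",
    "buy": "action", "purchase": "action",
}

# Hypothetical example sentences, each reduced to its keyword set.
EXAMPLES = [
    {"reserve", "room"},   # index 0
    {"buy", "ticket"},     # index 1
]

def build_fine_classes(thesaurus, examples):
    """Refine coarse classes by the partner class they co-occur with.

    Returns a map (coarse class, partner coarse class) -> example indices.
    """
    fine = defaultdict(set)
    for idx, sentence in enumerate(examples):
        for word in sentence:
            for other in sentence:
                if other != word:
                    fine[(thesaurus[word], thesaurus[other])].add(idx)
    return fine

def search(keywords, thesaurus, fine):
    """Return the index of the best-matching example sentence, or None.

    Keywords missing from the thesaurus (e.g. recognition errors) are skipped.
    """
    known = [w for w in keywords if w in thesaurus]
    votes = defaultdict(int)
    for word in known:
        for other in known:
            if other != word:
                key = (thesaurus[word], thesaurus[other])
                for idx in fine.get(key, ()):
                    votes[idx] += 1
    if not votes:
        return None
    # Ties are broken in favour of the lowest example index.
    return max(sorted(votes), key=lambda i: votes[i])

fine_classes = build_fine_classes(THESAURUS, EXAMPLES)
print(search({"book", "ticket"}, THESAURUS, fine_classes))
print(search({"reserve", "zzz", "bath"}, THESAURUS, fine_classes))
```

In this sketch, a keyword that never appears in the example sentences (such as "book") can still be matched, because the lookup uses its thesaurus class together with the class of the keyword it co-occurs with. An unrecognized token (such as "zzz") is simply ignored.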

Similar resources

An Experimental Multilingual Bi-directional Speech Translation System

We describe an experimental multilingual, bi-directional speech translation system utilizing small, PC-based hardware with a multi-modal user interface. Two major problems for people using an automatic speech translation device are speech recognition errors and language translation errors. We focus on developing techniques to overcome these problems. The techniques include a new language translati...

Recent Advances in Example-Based Machine Translation

This book, an outcome of a 2001 workshop on Example-Based Machine Translation (EBMT) in Santiago de Compostela, very appropriately starts with a preface by Professor Makoto Nagao in which he explains how the limits of rule-based Machine Translation (MT) led him to propose his translation-by-analogy principle in 1981 (published as Nagao, 1984). His idea, inspired by second language learning meth...

CLEF-2005 CL-SR at Maryland: Document and Query Expansion using Side Collections and Thesauri

This paper reports results for the University of Maryland’s participation in the CLEF-2005 Cross-Language Speech Retrieval track. Techniques that were tried include: (1) document expansion with manually created metadata (thesaurus keywords and segment summaries) from a large side collection, (2) query refinement with pseudo-relevance feedback, (3) keyword expansion with thesaurus synonyms, and (4) ...

Studying impressive parameters on the performance of Persian probabilistic context free grammar parser

In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, the Penn Treebank, was published. However, although tree banks originated in computational linguistics, their value is becoming more widely appreciated in linguistics research as a whole. F...

A Part-of-Speech-Based Search Algorithm for Translation Memories

The retrieval of related sentences in state-of-the-art translation memory systems is based on orthographic similarities. This often leads to poor search results, since orthographically similar sentences are not necessarily semantically related. In this paper we propose a search algorithm that aims to reduce this problem by taking part-of-speech information into account. It requires that the par...

Publication date: 2000